Influence, originality and similarity in directed acyclic graphs

Authors

  • Stanislao Gualdi
  • Matus Medo
  • Yi-Cheng Zhang
Abstract

We introduce a framework for network analysis based on random walks on directed acyclic graphs, in which the probability of passing through a given node is the key ingredient. We illustrate its use in evaluating the mutual influence of nodes and discovering seminal papers in a citation network. We further introduce a new similarity metric and test it in a simple personalized recommendation process. This metric’s performance is comparable to that of classical similarity metrics, thus further supporting the validity of our framework.

The past two decades have witnessed a network revolution [1] fueled by the ever-increasing computational power at our disposal and by the availability of rich datasets mapping virtually all fields of human activity [2, 3]. Complex networks and algorithms based on these resources have found applications in the most diverse fields, ranging from nonlinear dynamics and critical phenomena [4, 5] to social and economic systems [6]. Random walks are among the most prominent classes of processes taking place on networks, being employed in importance rankings for the World Wide Web [7], recommender systems [8], disease transmission models [9], node similarity [10] and many other areas [11].

A relatively less-studied class of networks is represented by directed acyclic graphs (DAGs), which occur in both natural and artificial systems. Their acyclicity (absence of directed cycles) stems either from an implicit time ordering (as in citation networks, where only past papers can be cited) or from natural constraints (as in food webs). Even when the nodes of a DAG do not have time stamps attached, a causal structure with all edges pointing from later to earlier nodes can always be recovered. Theoretical models exist for building random DAGs with fixed degree sequences or with fixed expected degrees [12, 13].

Acyclicity turns out to be highly advantageous for filtering information through a random walk process. If we consider a random walk on a generic network, the probability of passing through a given node, which we refer to as the passage probability, is usually not a meaningful quantity, as it may well be equal to one for all nodes in the network. The situation is the opposite if we instead consider a DAG, as every random walk along the network’s edges comes to an end when a root node with zero out-degree is reached.

In this Letter we introduce an analytical framework for DAGs to quantify the influence of one node over another based on the passage probability, and we discuss its applications. In particular, we propose a method to identify papers fundamental to the growth of a given research area and define a new similarity metric. The relation to PageRank, which has been applied to citation data before [14] (see [15] for a historical perspective on PageRank and other fields of its applicability), is also discussed. We test our framework on citation data provided by the American Physical Society and show that (i) the proposed method is able to uncover seminal papers even if they do not have particularly high citation counts, and (ii) the similarity metric performs well when used as a component of a simple recommendation algorithm [16]. Note that the time dimension, neglected by many information filtering techniques, is implicitly taken into account by acting on a DAG.

While we use academic citation data to test our model and often refer to papers and citations instead of nodes and edges, the majority of this work is general and applicable to other DAGs, such as those representing family trees and reference networks of patents [17] and legal cases [18].

Consider a directed acyclic graph composed of N nodes and L directed edges pointing from newer to older nodes.
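The passage probability described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the walker picks each outgoing edge of the current node with equal probability; the excerpt does not state the transition rule, and the function name `passage_probabilities` and the toy citation graph are hypothetical. Because a walk on a DAG visits any node at most once, the probability of passing through a node is the sum, over its incoming edges, of the probability of arriving along that edge.

```python
def passage_probabilities(out_edges, start):
    """Probability that a random walk starting at `start` passes through each node.

    out_edges maps a node to the list of nodes it points to (e.g. the papers
    it cites). Assumption for this sketch: each outgoing edge is followed
    with equal probability, and the walk stops at nodes with no out-edges.
    """
    order = []      # nodes reachable from `start`, in DFS post-order
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for target in out_edges.get(node, []):
            visit(target)
        order.append(node)

    visit(start)  # recursive DFS: adequate for small graphs only

    prob = {node: 0.0 for node in order}
    prob[start] = 1.0
    # Reversed post-order is a topological order: every node is processed
    # only after all of its predecessors have passed their probability on.
    for node in reversed(order):
        targets = out_edges.get(node, [])
        for target in targets:
            prob[target] += prob[node] / len(targets)
    return prob


# Toy citation graph: edges point from newer to older papers.
citations = {
    "d": ["b", "c"],
    "c": ["a", "b"],
    "b": ["a"],
    "a": [],          # root node with zero out-degree
}
result = passage_probabilities(citations, "d")
```

In this toy graph every walk from "d" ends at the single root "a", so its passage probability is one, while the intermediate papers "b" and "c" are visited with probabilities 0.75 and 0.5 respectively.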

Similar resources

Visual Similarity Perception of Directed Acyclic Graphs: A Study on Influencing Factors

While visual comparison of directed acyclic graphs (DAGs) is commonly encountered in various disciplines (e.g., finance, biology), knowledge about humans’ perception of graph similarity is currently quite limited. By graph similarity perception we mean how humans perceive commonalities and differences in graphs and herewith come to a similarity judgment. As a step toward filling this gap the st...

Efficient Kernels for Sentence Pair Classification

In this paper, we propose a novel class of graphs, the tripartite directed acyclic graphs (tDAGs), to model first-order rule feature spaces for sentence pair classification. We introduce a novel algorithm for computing the similarity in first-order rewrite rule feature spaces. Our algorithm is extremely efficient and, as it computes the similarity of instances that can be represented in explici...

Similarity of Weighted Directed Acyclic Graphs

This thesis proposes a weighted DAG (wDAG) similarity algorithm for match-making in e-Business environments. We focus on the metadata representation of buyer and seller agents, as well as a similarity and associated simplicity measure over this information. In order to make the interaction between agents more meaningful and fine-grained, we choose node-labeled, arc-labeled and arc-weighted dire...

Interval Propagation and Search on Directed Acyclic Graphs for Numerical Constraint Solving

The fundamentals of interval analysis on directed acyclic graphs (DAGs) for global optimization and constraint propagation have recently been proposed by Schichl and Neumaier [2005]. For representing numerical problems, the authors use DAGs whose nodes are subexpressions and whose directed edges are computational flows. Compared to tree-based representations [Benhamou et al. 1999], DAGs offer t...

Multiclass SVM Classification Using Graphs Calibrated by Similarity between Classes

In this paper, new learning structures, trees and directed acyclic graphs based on similarity between classes, are presented. The proposed structures are based on a distribution of recognized classes in a data space, unlike the known graph methods such as the tree-based One-Against-All (OAA) algorithm or the directed acyclic graph based One-Against-One (OAO) algorithm. The structures are created b...

Journal:
  • CoRR

Volume: abs/1108.3691

Publication date: 2011